Von Neumann projections are the main operations by which information can be extracted from
the quantum to the classical realm. They are, however, static processes that do not adapt to the
states they measure. Advances in the field of adaptive measurement have shown that this limitation
can be overcome by “wrapping” the von Neumann projectors in a higher-dimensional circuit which
exploits the interplay between measurement outcomes and measurement settings. Unfortunately,
the design of adaptive measurements has often been ad hoc and setup-specific. We shall here develop
a unified framework for designing optimized measurements. Our approach is two-fold: The first part is
algebraic and formulates the problem of measurement as a simple matrix diagonalization problem.
The second is algorithmic and models the optimal interaction between measurement outcomes and
measurement settings as a cascaded network of conditional probabilities. Finally, we demonstrate
that several figures of merit, such as Bell factors, can be improved by optimized measurements. This
leads us to the promising observation that measurement detectors which, taken individually, have
a low quantum efficiency can be arranged into circuits where, collectively, the limitations of
inefficiency are compensated for.
rithm, Neumark’s theorem


Contents

I. Introduction

II. Adaptive measurement as an algebraic problem
   A. An intuition for generalized measurements
   B. Adaptation as a similarity transformation
      1. Direct measurements
      2. Generalized measurements
   C. Adaptation as a dynamic problem

III. Adaptive measurement as an algorithmic problem
   A. Generalized measurements as superoperators
   B. The tree data structure
   C. Optimization algorithms

IV. Figures of merit and simulations
   A. Figures of merit
      1. Distinguishability
      2. Mean min-to-max ratio
      3. Error probability
      4. Bell factor
      5. Orthogonality
   B. Numerical simulations
      1. Hadamard rotation and APD
      2. Coherent displacement and APD
      3. Homodyne detection
      4. Distinguishability as a function of M and C
   C. Computational considerations

V. Discussion and outlook

Acknowledgments

A. Superoperators

References

I. INTRODUCTION 

Even within the physics community, the idea of measurement all too often evokes specific
laboratory devices, such as photon counters or homodyne detectors, to name a couple of
examples from quantum optics. In other words, we are accustomed to reducing measurements to
algebraic projections which are static in Hilbert space. For instance, the “field of view”
of a photon counter is immutably constrained along the diagonal of the Fock Hilbert space
and cannot be redirected to peek at the off-diagonal terms which conceal potentially
valuable phase information. This type of basic measurement is called a direct measurement
or, alternatively, a von Neumann projection [1].

Over the past few decades, several measurement schemes have been independently developed
which have successfully overcome the limitation of direct measurements. These schemes,
variously referred to as quantum receivers HO, quantum filtering measurements [1H7], or
adaptive measurements [SHia, have demonstrated that passive detection devices can be
augmented by multi-modal quantum circuits so as to gain optimal insight into the states
under scrutiny. Although they differ in their motives, these advances in quantum
measurement all have in common that they incorporate von Neumann projections into larger
setups involving ancillary resources, controllable unitary operations, and (most often)
some Bayesian logic governing feedback loops between the detection outcomes and the
unitary operations. We shall interchangeably refer to these collective techniques as
generalized measurements or adaptive measurements.

Although the theory of generalized measurements is well established, notably through the
work of Neumark m and Kraus, it remains under-exploited in the design of experiments.
Indeed, most of the experimental advances cited above were arrived at in an ad hoc fashion
where heuristic approaches and setup-dependent models overlooked the bigger picture
offered by Neumark’s theorem. In all cases, the overarching goal can be stated as follows:
Given the limited toolbox of detection devices and unitary operations that is readily
available in the laboratory, how can one design, in a systematic way, a quantum
measurement circuit that optimizes the relevance and accuracy of the acquired data?

In the present article, we will show that a structured solution to this problem can be
obtained on two fronts. The first one is algebraic: In Sec. II we develop an intuition for
Neumark’s theorem which captures the essence of adaptive measurements. This will lead us
to argue that the stochastic mindset which has so far dominated the theory of adaptive
measurements [EHn] is at the outset underpinned by a deterministic, albeit non-trivial,
algebraic problem. We will show that this algebraic formulation consists of finding a
similarity transformation from a multi-mode sequence of direct measurements to an optimal
positive-operator valued measure (POVM) in higher dimensions. The second aspect we shall
bring forth is a computational, or algorithmic, one. This is what we treat in Sec. III,
where we represent generalized measurements as Bayesian networks which, when laid out in
an optimal way, can mimic the statistics of the aforementioned optimal POVM. Even if this
approach does include conditional probabilities, its formulation is rather
straightforward and does not resort to the stochastic machinery of quantum trajectories
or master equations. Section IV discusses in detail the various figures of merit by which
the efficiency of generalized measurements can be assessed. A selection of numerical
simulations will be presented to illustrate the various trends of these figures of merit.
It will then become apparent that, for any given detection device, what we usually think
of as the limitations of quantum efficiency are really those of a direct measurement
setup. However, if several such inefficient devices are judiciously assembled into a
larger measurement circuit, the collective quantum efficiency is improved even if the
building blocks, taken individually, are inefficient.

Before proceeding, let us clarify what is meant by the optimization of measurements. In
quantum information, measurement is not only an end in itself, but can also be a means of
computation and state preparation. Different figures of merit can therefore be subjected
to optimization, such as distinguishability measures [18], discrimination errors, Bell
factors, state fidelity, etc. We shall introduce some of these figures of merit with a
particular focus on the single-shot discrimination of quantum states. Indeed, we argue
that the complete characterization of a state is really just a generalized discrimination
problem where the set of possible states to be distinguished is infinite. Hence, any
measurement is intrinsically a comparative operation that presupposes a pool of candidate
states.


II. ADAPTIVE MEASUREMENT AS AN 
ALGEBRAIC PROBLEM 

A. An intuition for generalized measurements 

Let us first develop a basic intuition for generalized measurements before formally
presenting the problem of quantum state discrimination. Assume we are interested in
discriminating a square from a circle which are drawn on a two-dimensional space such as a
sheet of paper. From within the paper, both figures will appear as straight lines; their
discrimination will therefore be impossible. However, if we step up to the third
dimension, we will immediately be able to distinguish them even if their projection onto
the detector (our eye) remains effectively two-dimensional. This is the essence of
Neumark’s theorem: By rising to higher dimensions in Hilbert space, we have the potential
to recover information which was otherwise traced out or “de-cohered” in the reduced space
containing the system of interest [Milo]. This said, it does not suffice to go up to
higher dimensions to implement an efficient generalized measurement. If instead of
comparing a circle and a square, we intend to compare an isosceles trapezoid and a square,
not only will we have to include a third dimension, but the vantage point (in this case
the Euclidean angle) from that third dimension will also have to be chosen carefully or
else both figures may again be indistinguishable due to coinciding perspectives. This is
especially crucial if we are constrained to a limited number of vantage points while
trying to maximize the information gain about the measured objects. One could go about
this problem in a stochastic way. For example, we could start from one random vantage
point and then, depending on what we “see”, move in one direction or another so as to
gradually increase the confidence in the discrimination. (This is the heuristic behind the
Dolinar and Kennedy receivers Hi as well as most of the subsequent schemes of adaptive
measurement.) However, it is clear that the very nature of the problem is deterministic
and the sequence of optimal vantage points can in principle be solved for exactly based
solely on the (Hilbert) geometry of the states at play.

To the admittedly naive analogy above, one should add the extra complication that in
quantum mechanics, the very act of measurement reshapes the geometrical figures (i.e.,
states) under observation, as dictated by the uncertainty principle. This fact introduces
a uniquely quantum twist to our story: The optimal vantage points are not only based on
the Hilbert space configuration of the objects to be distinguished, they are also
interdependent and chronologically ordered. This will become clearer as we move on to a
formal statement of the problem.

B. Adaptation as a similarity transformation 

Consider a pool of C candidate states $\hat\rho_c^{(0)}$, where $c \in \{1, \dots, C\}$,
which are contained in a Hilbert space $\mathcal{H}^{(0)}$. No assumption is made as to
the purity or the mutual orthogonality of these states, except that they are normalized
and that no two of them are exactly identical. Without loss of generality, we can assign
to each state c a prior probability $p_c^{(0)}$ that it be retrieved from the pool.
Physically, these probabilities could represent the classical rate of incidence of the
states onto the measurement apparatus. For completeness, we have

$$\sum_{c=1}^{C} p_c^{(0)} = 1. \qquad (1)$$

Furthermore, assume that there exist M possible outcomes $\mu \in \{1, \dots, M\}$ at the
end of the measurement [50]. The prior probabilities $p_c^{(0)}$ will be redistributed
among the different outcomes $\mu$, thereby creating probability distribution functions
$p_c^{(1)}(\mu)$ such that

$$p_c^{(0)} = \sum_{\mu=1}^{M} p_c^{(1)}(\mu). \qquad (2)$$

The parenthesized superscript indicates whether the probabilities pertain to before (0)
or after (1) the measurement.

Ideally, we want the measurement operation to be an injective map from the set of
candidate quantum states {c} to that of the classical readouts {$\mu$}. In other words,
perfectly unambiguous discrimination is only possible if each readout $\mu$ is mapped by
at most one candidate state c while each candidate state c maps to at least one readout
$\mu$. (Such a non-overlapping, one-to-many mapping incidentally requires that
$M \ge C$.) In general, however, this ideal condition is not likely to be met: The same
outcome $\mu$ may be mapped by more than one state c with varying probabilities, thereby
introducing ambiguity in the discrimination. We are then faced with an optimization
problem where the goal is to determine with highest confidence the classical identity c
of the unknown state. Since quantum states only manifest themselves to us through the
probability distributions they cast on the measurement detectors, we shall claim that any
two states are optimally discriminated if their probability distributions
$p_c^{(1)}(\mu)$ are as dissimilar as possible. A more rigorous definition of
“dissimilarity” will be provided in Sec. IV as one of the potential figures of merit
$\mathcal{F}$ relevant to quantum measurements. For now, let us denote by
$\hat O_\mu \in \mathcal{H}^{(0)}$ the ideal POVM which produces these maximally
dissimilar probability distributions or, more specifically, which maximizes
$\mathcal{F}$, i.e.,

$$\hat O_\mu = \operatorname{argmax} \mathcal{F}. \qquad (3)$$


In quantum metrology, for example, $\hat O_\mu$ could be any POVM which reaches the
Heisenberg limit. (Note that any such ideal POVMs are guaranteed to exist as demonstrated
in Refs. |53|53|.)

Recall that Born’s rule reads

$$p_c^{(1)}(\mu) = p_c^{(0)} \cdot \mathrm{Tr}\big[\hat O_\mu \hat\rho_c^{(0)} \hat O_\mu^\dagger\big], \qquad (4)$$

where we have propagated the prior probability $p_c^{(0)}$, and that $\hat O_\mu$
satisfies completeness

$$\sum_{\mu=1}^{M} \hat O_\mu^\dagger \hat O_\mu = \hat 1. \qquad (5)$$

The measurement schemes we will devise shall strive to mimic the ideal POVM
$\hat O_\mu$, or at least reproduce the probability distributions it generates with as
much fidelity as possible. We shall see how generalized measurements perform better at
this task than direct measurements. Let us first introduce the latter with a very generic
notation that we shall maintain throughout the rest of this article.
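
As a numerical illustration of Eqs. (2) and (4), the following sketch evaluates the joint distributions $p_c^{(1)}(\mu)$ for a toy pool of two non-orthogonal qubit states. The choice of states, the equal priors, and the function name `joint_distribution` are assumptions made purely for illustration; they do not come from the setups discussed in this article.

```python
import numpy as np

def joint_distribution(priors, states, povm):
    """Return p[c, mu] = p_c^(0) * Tr[O_mu rho_c O_mu^dagger], cf. Eq. (4)."""
    return np.array([[p * np.trace(O @ rho @ O.conj().T).real for O in povm]
                     for p, rho in zip(priors, states)])

# Assumed toy pool: the pure qubit states |0> and |+> with equal priors.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(k, k.conj()) for k in (ket0, ketp)]
priors = [0.5, 0.5]

# M = 2 von Neumann projectors onto the computational basis.
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

p = joint_distribution(priors, states, povm)
# Summing each row over the outcomes recovers the priors, as in Eq. (2).
row_sums = p.sum(axis=1)
```

In this toy case the outcome $\mu = 2$ can only be triggered by the second state, whereas $\mu = 1$ remains ambiguous; this is the kind of overlap the optimization is meant to reduce.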


1. Direct measurements 



FIG. 1: Generic representation of a direct measurement. The state $\hat\rho_c^{(0)}$ is
rotated by a unitary operation $\hat A_\tau$ parametrized by $\tau$ and then collapsed by
a projective measurement $\hat\Pi_\mu$ corresponding to one of M possible readouts
$\mu$. The empty set symbol $\emptyset$ indicates that the input state is irreversibly
“consumed” by the end of the measurement, at which point no further information can be
retrieved.

A direct measurement is represented generically in Fig. 1. The unknown candidate state
$\hat\rho_c^{(0)}$ undergoes a unitary operation $\hat A_\tau \in \mathcal{H}^{(0)}$
which can be tuned by a parameter (or set of parameters) $\tau$. This parameter could be
any degree of freedom available to us in the laboratory, such as a coherent displacement,
a squeezing factor, or the rotation angle induced by a set of wave plates. It is by
tuning $\tau$ that we can search for the optimal vantage points discussed earlier. After
this, the transformed state is collapsed by one of M possible von Neumann projectors
$\hat\Pi_\mu \in \mathcal{H}^{(0)}$, to each of which corresponds a classical output
$\mu \in \{1, \dots, M\}$. If we assume that the detection device never fails to produce
an output, the projectors should add up to unity

$$\sum_{\mu=1}^{M} \hat\Pi_\mu = \hat 1. \qquad (6)$$

Inconclusive outcomes or failures in detection can be accounted for by allocating a
fictitious outcome $\mu_{\rm fail}$ among the M outputs.

The probability that a state c from the pool of candidate states triggers a readout
$\mu$ is given by

$$p_c^{(1)}(\mu) = p_c^{(0)}\, p_{\mu|c}, \qquad (7)$$

where $p_{\mu|c}$ is the probability that a certain outcome $\mu$ is obtained given that
state c was incident (i.e., notwithstanding its prior probability). This conditional
probability is given by

$$p_{\mu|c} = \mathrm{Tr}\big[\big(\hat A_\tau^\dagger \hat\Pi_\mu \hat A_\tau\big)\, \hat\rho_c^{(0)}\, \big(\hat A_\tau^\dagger \hat\Pi_\mu \hat A_\tau\big)^\dagger\big]. \qquad (8)$$

Now that we have an expression for the probability distributions projected by the
candidate states on the spectrum of outcomes, it remains to ensure that these
distributions are maximally discriminated. This can be achieved by tuning the
controllable parameter $\tau$ such that the combination of $\hat A_\tau$ and
$\hat\Pi_\mu$ mimics as much as possible the statistics produced by the ideal POVM
$\hat O_\mu$. By comparing Eq. (8) with Eq. (4), we therefore need to solve for the
optimal $\tau_0$ which best approximates a pseudo-similarity transformation from the von
Neumann projector to the ideal POVM

$$\hat A_{\tau_0}^\dagger \hat\Pi_\mu \hat A_{\tau_0} \approx \hat O_\mu \quad \forall \mu. \qquad (9)$$

Conceptually, what this transformation does in Hilbert space is to align the set of
candidate states $\hat\rho_c^{(0)}$ with the von Neumann projectors $\hat\Pi_\mu$. The
unitary $\hat A_\tau$ thus serves to present the states in a more revealing configuration
in Hilbert space. In practice, however, because of the limited leeway offered by $\tau$,
a strict equality in Eq. (9) will be unlikely. Therefore, we may as well give a more
operational statement of the problem whereby we search for the argument of the maximum
for the figure of merit

$$\tau_0 = \underset{\tau \in T}{\operatorname{argmax}}\; \mathcal{F}, \qquad (10)$$

where T is the parameter space, e.g., $[-\pi, \pi]$ for polarization rotations, or
$\mathbb{C}$ for coherent state translations.

2. Generalized measurements

We have just seen how a von Neumann projector can be amended with a unitary operation to
improve the overall measurement efficacy. The flexibility afforded by the parameter
$\tau$ can be used to somewhat adjust the orientation of the quantum states with respect
to the projection operator. This leeway is nonetheless constrained by the very nature of
the unitary operations $\hat A_\tau$ as well as their availability in our laboratory
toolbox. Although it is always possible to conceive of a better unitary $\hat A_\tau$
[551 US], it may not exist physically or may simply be too demanding to engineer. This is
where Neumark’s theorem comes in. By rising to higher dimensions in Hilbert space, the
measurement setup can be made even more flexible (i.e., adaptive) while still exploiting
the same available building blocks of $\hat A_\tau$ and $\hat\Pi_\mu$. This is achieved
by coupling the unknown candidate state $\hat\rho_c^{(0)}$ with N known ancillary states
$\hat\rho_{\rm anc}^{(k)} \in \mathcal{H}^{(k)}$ where $k \in \{1, \dots, N\}$. The
parenthesized superscript labels the quantum modes: The zeroth mode is occupied by the
input and all the ancillaries span modes 1 to N. The coupling of all N + 1 modes, i.e.,
the ancillaries plus the unknown input, could be achieved by a beam splitting operation
$\hat B \in \bigotimes_{k=0}^{N} \mathcal{H}^{(k)}$. At each output of the beam
splitters, one then grafts the same direct measurement described in Sec. II B 1. This
multiplexed arrangement (Fig. 2) of direct measurements will produce N classical
readouts, one from each mode, which we shall bundle into an array of length N

$$\boldsymbol\mu_l = \big[\mu^{(1)}, \dots, \mu^{(k)}, \dots, \mu^{(N)}\big], \qquad (11)$$

where $l \in \{1, \dots, M^N\}$ uniquely identifies one set of outcomes among all the
possible combinations.

What we shall consider from now on is therefore the probability distribution mapped by
the candidate states c onto the outcomes $\boldsymbol\mu_l$. Just as in Eq. (7), these
probability distributions are given by

$$p_c^{(N)}(\boldsymbol\mu_l) = p_c^{(0)}\, p_{\boldsymbol\mu_l|c}. \qquad (12)$$

Here again, the parenthesized superscript over the probabilities labels the number of
completed measurements: (0) indicates prior probabilities whereas (N) indicates that all
N measurements have been completed. (These labels should not be confused with the
somewhat related superscripts over the density matrices of the ancillary states and
their Hilbert spaces: Those indicate the quantum modes.)

The conditional probability that a certain outcome sequence $\boldsymbol\mu_l$ is
triggered by a state c is given by

$$p_{\boldsymbol\mu_l|c} = \mathrm{Tr}\big[\big(\hat U_{\boldsymbol\tau}^\dagger \tilde\Pi_{\boldsymbol\mu_l} \hat U_{\boldsymbol\tau}\big)\, \tilde\rho_c^{(0)}\, \big(\hat U_{\boldsymbol\tau}^\dagger \tilde\Pi_{\boldsymbol\mu_l} \hat U_{\boldsymbol\tau}\big)^\dagger\big], \qquad (13)$$
where we have assembled the coupling operator $\hat B$ and the operations
$\hat A_{\tau^{(k)}}$ into one big unitary

$$\hat U_{\boldsymbol\tau} = \Big[\hat 1^{(0)} \otimes \bigotimes_{k=1}^{N} \hat A_{\tau^{(k)}}\Big]\, \hat B. \qquad (14)$$

The N-dimensional array $\boldsymbol\tau \in T^N$ represents a combination of parameter
settings for the unitary operations at each mode

$$\boldsymbol\tau = \big[\tau^{(1)}, \dots, \tau^{(k)}, \dots, \tau^{(N)}\big]. \qquad (15)$$

Similarly, we have grouped the von Neumann projections and the input states into single
matrices, indicated by a tilde, and spanning all N + 1 modes:

$$\tilde\Pi_{\boldsymbol\mu_l} = \hat 1^{(0)} \otimes \bigotimes_{k=1}^{N} \hat\Pi_{\mu^{(k)}}, \qquad (16)$$

$$\tilde\rho_c^{(0)} = \hat\rho_c^{(0)} \otimes \bigotimes_{k=1}^{N} \hat\rho_{\rm anc}^{(k)}. \qquad (17)$$

Note that since the ancillary states $\hat\rho_{\rm anc}^{(k)}$ are known and initially
independent of the candidate states $\hat\rho_c^{(0)}$, the information content of the
augmented state $\tilde\rho_c^{(0)}$ is exactly the same as that of $\hat\rho_c^{(0)}$.

If we draw the parallel between Eq. (13) and its direct measurement analog, Eq. (8), we
see that we can again tweak the parameters $\boldsymbol\tau$ to approximate a
pseudo-similarity transformation akin to that of Eq. (9):

$$\hat U_{\boldsymbol\tau_0}^\dagger \tilde\Pi_{\boldsymbol\mu_l} \hat U_{\boldsymbol\tau_0} \approx \tilde O_{\boldsymbol\mu_l} \quad \forall\, l, \qquad (18)$$

where $\tilde O_{\boldsymbol\mu_l}$ is the multi-mode counterpart of the ideal POVM
conceived of in Eq. (3). Alternatively, the parameter setting $\boldsymbol\tau_0$ which
maximizes the measurement figure of merit can be found by optimization such that

$$\boldsymbol\tau_0 = \underset{\boldsymbol\tau \in T^N}{\operatorname{argmax}}\; \mathcal{F}. \qquad (19)$$

FIG. 2: A generalized measurement is represented schematically as the multi-mode
augmentation of a direct measurement. Whereas a direct measurement only spans the modes
populated by the candidate states, a generalized measurement involves states and
operations in ancillary modes which, when chosen appropriately, provide a
higher-dimensional perspective on the measured state. Here, we show the candidate state
in the zeroth mode (the horizontal quantum channel) being coupled with known ancillary
states in the modes 1 to N (the vertical quantum channels). Taken individually, each
ancillary mode then undergoes the same direct measurement process as described in Fig. 1.
In contrast to the direct measurement, the generalized setup provides a finer-grained
projection space for the probability distribution functions, with $M^N$ possible outcomes
(as opposed to only M). Similarly, the generalized setup provides not just one, but N
degrees of freedom for tuning the unitary parameters $\boldsymbol\tau$.


We can already see that a generalized measurement offers a two-fold advantage over its
direct counterpart. The first is that the cardinality of the projection space, i.e., the
set of classical outcomes, increases from M to $M^N$, thereby opening up the possibility
of discriminating more states than would be possible if C > M. Moreover, the increased
range of classical readouts allows for a crisper resolution of the probability
distributions. This can in principle be valuable in reducing their overlap, and therefore
in reducing the ambiguity of the discrimination. (Recall that perfect discrimination
requires an injective mapping from the set of candidate states c to that of outcomes
$\mu$.) The second advantage of generalized measurements is that they provide us with not
one, but N “tuning knobs” $\tau$. If we add to this the choice of N ancillary states, it
becomes clear that the generalized setup offers much more leeway to prepare the candidate
states before they are irreversibly collapsed by the von Neumann projections.
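
As a concrete reference point for the direct-measurement search of Eq. (10), the sketch below performs a brute-force sweep over a single rotation angle. Everything specific here is an assumption for illustration only: the two-state qubit pool, the real rotation standing in for $\hat A_\tau$, the grid resolution, and the use of the maximum-a-posteriori success probability as the figure of merit (the figures of merit actually used are defined in Sec. IV).

```python
import numpy as np

def rotation(tau):
    """Real rotation by angle tau: an assumed stand-in for the tunable unitary A_tau."""
    return np.array([[np.cos(tau), -np.sin(tau)],
                     [np.sin(tau),  np.cos(tau)]])

def conditional_probs(tau, states):
    """p[c, mu] = Tr[Pi_mu A rho_c A^dagger] for computational-basis projectors."""
    A = rotation(tau)
    return np.array([np.diag(A @ rho @ A.conj().T).real for rho in states])

def success_probability(tau, priors, states):
    """Probability that a maximum-a-posteriori guess is correct (assumed figure of merit)."""
    joint = np.asarray(priors)[:, None] * conditional_probs(tau, states)
    return joint.max(axis=0).sum()

# Assumed toy pool: |0> and |+> with equal priors.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(k, k.conj()) for k in (ket0, ketp)]
priors = [0.5, 0.5]

# Parameter sweep over T = [-pi, pi], i.e., a discretized argmax as in Eq. (10).
taus = np.linspace(-np.pi, np.pi, 2001)
scores = [success_probability(t, priors, states) for t in taus]
tau0 = taus[int(np.argmax(scores))]
best = max(scores)
```

For this particular pair of states a single rotation already suffices to approach the Helstrom bound, $\tfrac{1}{2}(1 + \sqrt{1/2}) \approx 0.854$. The generalized setup becomes relevant when the available unitaries are too restricted for such an alignment.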


C. Adaptation as a dynamic problem 


So far, we have presented the optimization of measurement as a deterministic, one-off
calculation based on what we know about the Hilbert space geometry of the input states
and the projectors. This culminated with two algebraic formulations of the problem,
namely Eqs. (9) and (18). One could be content with this understanding of measurement
optimization as the search for an optimal (but static) vantage point $\tau_0$ EH-.
However, there exists yet a third and more crucial difference between direct and
generalized measurements. Whereas wave function collapse occurs at once in the former, it
has the potential to happen gradually in the latter. With each partial collapse up to
mode k, the pool of candidate states is reshaped; hence, the optimal vantage points
$\tau_0^{(k')}$ at the remaining modes $k' > k$ have to be shifted accordingly. (This
process is referred to as quantum jumps in some of the literature on quantum diffusion
[T3J[T3; the only difference here is that the observer acts as the bath.) This gradual
updating of $\boldsymbol\tau_0$ based on the history of outcomes $\boldsymbol\mu$ means
that, in effect, measurement optimization can operate not only in Hilbert space, but also
in time. As the gradual collapse takes place, $\boldsymbol\tau_0$ and $\boldsymbol\mu$
will grow in tandem, with $\boldsymbol\mu$ lagging behind $\boldsymbol\tau_0$ by one
element. In the next section we shall model the relationship between the optimal
parameters $\boldsymbol\tau_0$ and the history of outcomes $\boldsymbol\mu$ with a simple
Bayesian network that can be used to infer the identity of the candidate states.


III. ADAPTIVE MEASUREMENT AS AN 
ALGORITHMIC PROBLEM 


Although Eqs. (18) and (19) are self-contained, solving them for $\boldsymbol\tau_0$
analytically is non-trivial. It may therefore be more practical to resort to numerical
methods. These methods simulate all possible combinations of outcomes $\boldsymbol\mu$
and, for each such combination, perform a parameter sweep over the elements of
$\boldsymbol\tau$ so as to maximize the figure of merit $\mathcal{F}$ of interest. We
shall see how this can be tackled by different algorithms which can be either
local-optimal or global-optimal, depending on the development cost one is willing to
allocate to the problem. These algorithms have in common that they give shape to Bayesian
networks which can subsequently be used by experimentalists as look-up tables whereby
each setting $\tau_0^{(k)}$ is linked to the history of outcomes
$\boldsymbol\mu^{(k-1)} = [\mu^{(1)}, \dots, \mu^{(k-1)}]$.

Let us first introduce the notion of superoperator with which the networks are most
conveniently traversed.


A. Generalized measurements as superoperators 



FIG. 3: Representation of the generalized measurement setup of Fig. 2 as a recursive
two-mode superoperation. The superoperator $\$^{(k)}$ could be thought of as a black box
that spans the Hilbert space $\mathcal{H}^{(0)}$ of the input state. This black box
features a classical control knob for $\tau^{(k)}$, a classical readout for
$\mu^{(k)}$, as well as a quantum input port for $\hat\rho_c^{(k-1)}$ which is output as
$\hat\rho_c^{(k)}$. The transmission of the beam splitter can in principle be
incorporated into $\tau^{(k)}$ as yet another unitary degree of freedom.


We have concluded Sec. II by noting that, under a gradual collapse of the wave function,
one can further exploit the adaptation of the parameters $\boldsymbol\tau_0$ to the
history of outcomes $\boldsymbol\mu$. Gradual collapse is however unwieldy to treat with
multi-modal matrices such as those of Eq. (13) as that would require the cumbersome
nesting of partial traces. A better solution, which readily lends itself to
implementation, is to confine the whole problem to the Hilbert space of the zeroth mode
and recursively update the candidate states and their probability distributions upon each
collapse of the ancillary mode. This recursion, schematized in Fig. 3, transforms the
candidate states from one collapse to the next via a superoperator [24]:

$$\hat\rho_c^{(k)} = \$^{(k)}\big[\hat\rho_c^{(k-1)}\big] = \sum_{i,n,m=0} \langle n|\hat\rho_{\rm anc}^{(k)}|m\rangle\, \hat K_{i,n}\, \hat\rho_c^{(k-1)}\, \hat K_{i,m}^\dagger, \qquad (20)$$

where the Kraus operators are given by

$$\hat K_{i,j} = \langle i|\, \hat A_{\tau^{(k)}}^\dagger \hat\Pi_{\mu^{(k)}} \hat A_{\tau^{(k)}} \hat B^{(k)}\, |j\rangle. \qquad (21)$$

Note that in both Eqs. (20) and (21), the Dirac notation pertains to the ancillary mode.
A detailed derivation of the superoperation is given in Appendix A.
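
The recursion of Eqs. (20) and (21) can be sketched numerically for the smallest non-trivial case, a qubit candidate state coupled to a qubit ancilla. The partial-swap coupling used below in place of the beam splitter $\hat B$, the real rotation used for $\hat A_\tau$, and all function names are assumptions chosen only to keep the example small; the output state is left unnormalized so that its trace is the probability of the readout.

```python
import numpy as np

def rotation(tau):
    """Assumed stand-in for the tunable unitary A_tau acting on the ancilla."""
    return np.array([[np.cos(tau), -np.sin(tau)],
                     [np.sin(tau),  np.cos(tau)]])

def partial_swap(theta):
    """Assumed stand-in for the coupling B: exp(i*theta*SWAP) on system (x) ancilla."""
    swap = np.eye(4)[[0, 2, 1, 3]]
    return np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * swap

def kraus_operators(mu, tau, theta):
    """K[i, j] = <i| A^dag Pi_mu A B |j>, Dirac notation on the ancilla, cf. Eq. (21)."""
    A = rotation(tau)
    Pi = np.zeros((2, 2))
    Pi[mu, mu] = 1.0
    G = np.kron(np.eye(2), A.conj().T @ Pi @ A) @ partial_swap(theta)
    G = G.reshape(2, 2, 2, 2)        # indices: (sys_out, anc_out, sys_in, anc_in)
    return G.transpose(1, 3, 0, 2)   # K[i, j] is a 2x2 operator on the system mode

def superoperator(rho, rho_anc, mu, tau, theta):
    """Unnormalized update of the candidate state for readout mu, cf. Eq. (20)."""
    K = kraus_operators(mu, tau, theta)
    out = np.zeros((2, 2), dtype=complex)
    for i in range(2):
        for n in range(2):
            for m in range(2):
                out += rho_anc[n, m] * K[i, n] @ rho @ K[i, m].conj().T
    return out

rho = 0.5 * np.ones((2, 2))          # candidate state |+><+|
rho_anc = np.diag([1.0, 0.0])        # ancilla prepared in |0><0|
outputs = [superoperator(rho, rho_anc, mu, tau=0.3, theta=np.pi / 4) for mu in (0, 1)]
probs = [np.trace(out).real for out in outputs]
```

Because the two projectors add up to unity, the traces of the two outputs sum to one. Applying `superoperator` repeatedly, with a fresh ancilla each time, reproduces the nested structure of the gradual collapse without any multi-mode partial traces.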


The multi-mode expression for Born’s rule, Eq. (13), can be re-expressed as N recursive
superoperations whereby the probability distributions up to the kth ancillary mode
collapse are given by

$$p_c^{(k)}\big(\boldsymbol\mu^{(k)}\big) = p_c^{(0)}\, p_{\boldsymbol\mu^{(k)}|c}, \qquad (22)$$

where

$$p_{\boldsymbol\mu^{(k)}|c} = \mathrm{Tr}\Big\{ \$^{(k)} \cdots \$^{(1)} \big[\hat\rho_c^{(0)}\big] \Big\}. \qquad (23)$$


B. The tree data structure 


We now have all the tools to present the Bayesian network that governs the probability
distributions $p_c^{(k)}$ under a gradual collapse scenario. This network is best
represented by a class probability tree [25]. The tree data structure is indeed an
increasingly favored choice in the representation of the information-theoretic flow in
many quantum processes (cf. Refs. [5SHSII). In our case, each node in the tree stores the
density matrices of the candidate states $\hat\rho_c^{(k)}$, their probabilities
$p_c^{(k)}$, and the values of the unitary parameters. From each node emanate M edges
which correspond to the M possible outcomes of the direct measurement. This pattern is
iterated all the way to the depth N of the tree, where each leaf node takes on a unique
label $l \in \{1, \dots, M^N\}$. The root node, which contains the input candidate states
and their prior probabilities, lies at level k = 0. Finally, the sequence of outcomes
$\boldsymbol\mu_l$ leading up to the lth leaf is called a branch. The tree data structure
is illustrated for the case of N = 4 and M = 2 in Fig. 4.

Note that in order not to overload the notation, we have not specified any label to
uniquely identify the nodes within a given level. Notational rigor should however require
all parameters pertaining to any given node to be labeled by the coordinates (k, v) where
$k \in \{1, \dots, N\}$ is the level in the tree and $v \in \{1, \dots, M^k\}$ is the
horizontal position of the node at that level. This more rigorous notation is exemplified
in the inset of Fig. 4.
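
The bookkeeping of levels, branches, and leaf labels can be made concrete with a short sketch. The `Node` class, the zero-based outcome values, and the convention of reading a branch as base-M digits are assumptions made for illustration; the convention was chosen because it reproduces the example quoted in the caption of Fig. 4, where the leaf l = 6 corresponds to the sequence [0, 1, 0, 1].

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    level: int                      # k, with the root at k = 0
    history: tuple                  # outcomes (mu^(1), ..., mu^(k)) leading to this node
    children: list = field(default_factory=list)

def build_tree(M, N, level=0, history=()):
    """Full M-ary tree of depth N; each edge is one outcome of a direct measurement."""
    node = Node(level, history)
    if level < N:
        node.children = [build_tree(M, N, level + 1, history + (mu,))
                         for mu in range(M)]
    return node

def leaves(node):
    """Leaf nodes from left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def leaf_label(history, M):
    """1-based label l of a branch, reading its outcomes as base-M digits (assumed convention)."""
    label = 0
    for mu in history:
        label = label * M + mu
    return label + 1

tree = build_tree(M=2, N=4)
branches = [leaf.history for leaf in leaves(tree)]   # M**N = 16 branches
label_of_example = leaf_label((0, 1, 0, 1), M=2)     # the branch highlighted in Fig. 4
```

In a full implementation each node would additionally hold the candidate density matrices, their probabilities, and the chosen unitary parameter, as described above.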

Ideally, the goal of a generalized measurement is to have any given leaf mapped by
exactly one candidate state c, or at least to minimize the overlap of the projected
probability distributions at the leaves. Only then will the figures of merit be
maximized. (We shall return to the exact definitions of the figures of merit in the next
section.) As we saw in the previous section, we can determine the sequence
$\boldsymbol\tau$ which satisfies Eqs. (18) or (19). In this case, all the unitary
parameters $\tau^{(k)}$ in the nodes at a given level k will be equal. Although this
represents an improvement over the direct measurement, it does not take advantage of the
chronology of the outcomes $\mu^{(k)}$. For that, even within a given level, the
parameters $\tau^{(k)}$ at each node will have to be adapted to the particular shape (in
Hilbert space) that the candidate states have inherited from previous measurements. Let
us next explain how the parameters $\tau^{(k)}$ can be optimized for each node.



FIG. 4: In a gradual collapse scenario, the relationship between the history of outcomes
and the unitary parameters is best represented by a tree structure where each
ramification corresponds to the possible outcomes of the direct measurement. At each
node, one needs to determine the parameters $\tau_0^{(k)}$. (The entire data structure is
in this sense a decision tree.) Here, we show the case of a direct measurement which only
has M = 2 possible outcomes, e.g., an avalanche photodiode or a homodyne detector whose
quadrature readouts are partitioned into two complementary ranges. The candidate states
undergo N = 4 de-localizations such that, in the end, the whole setup presents
$M^N = 16$ outcomes. Any given run will traverse one of these 16 distinct branches
$\boldsymbol\mu_l$ with a probability of $\sum_c p_c^{(N)}(\boldsymbol\mu_l)$. In this
case we have highlighted the outcome l = 6 whose sequence is
$\boldsymbol\mu_6 = [0, 1, 0, 1]$. With such an a posteriori knowledge, one can then
infer backwards what state c was most likely to have been input. In order to illustrate
the labeling in our notation, the inset presents the node marked by an asterisk.


C. Optimization algorithms 


One of the simplest methods determines the optimal parameters $\boldsymbol\tau_0$ on the
fly: At each kth partial measurement, it tries to maximize the figure of merit by
performing a parameter sweep over $\tau^{(k)} \in T$. (This is known as a greedy
algorithm [32] whereby optimization abides by a short-term, maximum-gain policy.) In
practice, one could proceed with a so-called pre-order traversal: One first determines
the parameter $\tau_0^{(k)}$ at the current node and then recursively visits the children
nodes from, say, left to right. At the end, all internal nodes will have been assigned a
value for $\tau_0^{(k)}$. In an experimental context, each time a partial measurement
$\mu^{(k)}$ is recorded, $\tau_0^{(k+1)}$ is looked up from the tree and fed forward to
the next measurement. As the sequence of outcomes $\boldsymbol\mu$ is gradually acquired,
one could even perform a “live update” on the confidence of having identified a certain
state c with a maximum likelihood estimation based on the probabilities
$p_{\boldsymbol\mu^{(k)}|c}$.
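
A minimal sketch of such a greedy pre-order traversal is given below for a toy qubit model. Several ingredients are assumptions introduced only for illustration: the partial-swap coupling standing in for the beam splitter, the ancilla prepared in |0>, the coarse parameter grid, and the use of the immediate maximum-a-posteriori success probability as the local figure of merit. Each candidate is stored as an unnormalized matrix whose trace is the joint probability of the state and the outcome history.

```python
import numpy as np

def rotation(tau):
    """Assumed stand-in for the tunable unitary A_tau on the ancilla."""
    return np.array([[np.cos(tau), -np.sin(tau)],
                     [np.sin(tau),  np.cos(tau)]])

def step(weighted_rho, mu, tau, theta=np.pi / 4):
    """Unnormalized candidate state after coupling to an ancilla and reading out mu."""
    swap = np.eye(4)[[0, 2, 1, 3]]
    B = np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * swap   # assumed coupling
    Pi = np.zeros((2, 2))
    Pi[mu, mu] = 1.0
    G = np.kron(np.eye(2), Pi @ rotation(tau)) @ B
    anc = np.diag([1.0, 0.0])                                   # ancilla in |0><0|
    full = G @ np.kron(weighted_rho, anc) @ G.conj().T
    return np.einsum('iaja->ij', full.reshape(2, 2, 2, 2))      # trace out the ancilla

def greedy_tree(weighted, depth, taus, history=()):
    """Pre-order traversal: fix tau_0 at this node, then visit the children left to right."""
    node = {'history': history, 'joint': [np.trace(w).real for w in weighted]}
    if depth == 0:
        return node

    def score(tau):
        # Success probability of a maximum-a-posteriori guess right after this readout.
        return sum(max(np.trace(step(w, mu, tau)).real for w in weighted)
                   for mu in (0, 1))

    node['tau0'] = max(taus, key=score)          # local (greedy) parameter sweep
    node['children'] = [greedy_tree([step(w, mu, node['tau0']) for w in weighted],
                                    depth - 1, taus, history + (mu,))
                        for mu in (0, 1)]
    return node

def leaves(node):
    if 'children' not in node:
        return [node]
    return [leaf for child in node['children'] for leaf in leaves(child)]

# Assumed toy pool: |0> and |+> with equal priors, absorbed into the weighted states.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
weighted = [0.5 * np.outer(k, k.conj()) for k in (ket0, ketp)]

tree = greedy_tree(weighted, depth=3, taus=np.linspace(-np.pi / 2, np.pi / 2, 41))
p_success = sum(max(leaf['joint']) for leaf in leaves(tree))
```

The resulting dictionary plays the role of the look-up table: after recording the outcomes along a branch, the experimenter reads `tau0` from the corresponding node, and the `joint` entries at a leaf give the relative likelihoods of the candidates.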

Though relatively simple to program and often as efficient as global optimization methods |2S|, greedy algorithms run the risk of getting stuck at local optima. Indeed, the parameters $\tau_o^{(k)}$ are determined locally at each node in a top-down manner from root to leaf. A truly global algorithm, on the other hand, would not just perform a parameter sweep over the parameter range $\mathcal{T}$ one node at a time, but would probe all possible combinations of $\tau \in \mathcal{T}$ over all internal nodes. Due to the exponential growth of the tree with its depth $N$, such a global parameter sweep is likely to be numerically demanding. Among the global optimization methods, dynamic programming is one of the most tractable candidates [a [111132] as it avoids redundancies in the parameter sweep of the decision trees while still probing all combinations. Several hybrid heuristics also exist which combine global- and local-optimal performance. Many of the established techniques from machine learning could be relevant in this regard [HI |33]. One such technique, particle swarm optimization, has incidentally been applied to phase estimation by Hentschel and Sanders in Ref. [H].
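The greedy pre-order traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the simulations: the unitary is replaced by a real rotation, the partial measurement by a hypothetical weak two-outcome measurement (`collapse`), and the local figure of merit by the Bhattacharyya-type overlap of the candidates over the children of a node. All function names are ours.

```python
import math
from itertools import combinations

def rotate(psi, tau):
    """Apply a real rotation by angle tau (toy stand-in for the tunable unitary)."""
    a, b = psi
    return (a * math.cos(tau) - b * math.sin(tau),
            a * math.sin(tau) + b * math.cos(tau))

def collapse(psi, outcome, strength):
    """Toy weak measurement with M = 2 outcomes; returns the unnormalized state.

    The Kraus operators K0 = diag(1, sqrt(1 - s)) and K1 = diag(0, sqrt(s))
    satisfy K0^T K0 + K1^T K1 = identity, so probabilities are conserved.
    """
    a, b = psi
    if outcome == 0:
        return (a, b * math.sqrt(1.0 - strength))
    return (0.0, b * math.sqrt(strength))

def overlap(states, tau, strength):
    """Pairwise overlap of the candidates' joint probabilities over the two children."""
    total = 0.0
    for outcome in (0, 1):
        probs = []
        for psi in states:
            a, b = collapse(rotate(psi, tau), outcome, strength)
            probs.append(a * a + b * b)
        total += sum(math.sqrt(p * q) for p, q in combinations(probs, 2))
    return total

def greedy_preorder(states, depth, grid, strength, path=(), tree=None, leaves=None):
    """Assign tau_o to each internal node: visit the node, then its children left to right."""
    tree = {} if tree is None else tree
    leaves = {} if leaves is None else leaves
    if depth == 0:
        # Leaf: store the probability of this outcome sequence for each candidate.
        leaves[path] = [a * a + b * b for a, b in states]
        return tree, leaves
    # Local parameter sweep: keep the sample point with the smallest overlap.
    tau_o = min(grid, key=lambda tau: overlap(states, tau, strength))
    tree[path] = tau_o
    for outcome in (0, 1):
        children = [collapse(rotate(psi, tau_o), outcome, strength) for psi in states]
        greedy_preorder(children, depth - 1, grid, strength, path + (outcome,), tree, leaves)
    return tree, leaves

amp = 1.0 / math.sqrt(2.0)
candidates = [(amp, amp), (amp, -amp)]  # (|0> + |1>)/sqrt(2) and (|0> - |1>)/sqrt(2)
grid = [-math.pi + i * 2.0 * math.pi / 39 for i in range(40)]  # 40 equidistant points
tree, leaves = greedy_preorder(candidates, depth=3, grid=grid, strength=0.5)
```

In an experiment, `tree[(0, 1)]` would be the setting fed forward after the outcomes 0 and then 1 have been recorded, while `leaves` holds the distributions from which the figures of merit are computed.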


IV. FIGURES OF MERIT AND SIMULATIONS 

In order not to clutter the previous discussions, we have so far only referred to the figures of merit symbolically as $\mathcal{F}$. Let us now define some of them in detail so as to examine the performance trends of generalized measurement.


A. Figures of merit 

As we briefly mentioned in the introduction, a measurement can always be reduced to a discrimination problem: if we measure some state $\hat{\rho} \in \mathcal{H}$ and completely characterize it, we are in a sense discriminating it from everything else in $\mathcal{H}$ that it is not. The pool of candidate states is in that case infinite. If we however discretize the pool of candidate states, we are back to the discrimination problem we have treated so far. We shall therefore surmise that an optimal measurement is that which best distinguishes the elements in a given pool of possibilities. Experimentally, the only evidence we have to go by when distinguishing quantum states $c$ is the probability distribution functions $p_c(\mu)$ they cast on the measurement spectrum. It is then a natural choice to start with figures of merit $\mathcal{F}$ that depend on $p_c(\mu)$.


1. Distinguishability 

Fuchs [m 133] provides a thorough survey of discrimination measures. Let us adapt some of them to our purposes. The first, which we shall generically refer to as “distinguishability” and denote by $\mathcal{D}$, is based on the Bhattacharyya coefficient [55H57]

$$\mathrm{BC}(p_c, p_{c'}) = \sum_x \sqrt{p_c(x)\, p_{c'}(x)} \in [0, 1], \qquad (24)$$

between two normalized probability distributions $p_c(x)$ and $p_{c'}(x)$. This coefficient, which is just an inner product of two functions of $x$, quantifies their similarity in the same way that a dot product quantifies the overlap of two vectors. In our case, we are dealing with a pool of $C$ different probability distributions. We therefore propose to define a Bhattacharyya coefficient which averages out the similarity between all the possible pairs in the pool

$$\mathrm{BC}(p_1, \cdots, p_C) = \sum_{\substack{c, c' = 1 \\ c \neq c'}}^{C} p_c^{(0)}\, p_{c'}^{(0)}\, \mathrm{BC}(p_c, p_{c'}). \qquad (25)$$

Note that the averaging took into account the prior probabilities $p_c^{(0)}$ of the candidate states. Since we are interested in distinguishability rather than similarity, we shall use a modification of the Bhattacharyya coefficient referred to as the Hellinger distance and defined as

$$\mathrm{HD}(p_i, p_j) = \sqrt{1 - \mathrm{BC}(p_i, p_j)}. \qquad (26)$$

Putting together Eqs. (25) and (26), we define distinguishability for our purposes as

$$\mathcal{D} = \sqrt{1 - \sum_{\substack{c, c' = 1 \\ c \neq c'}}^{C} p_c^{(0)}\, p_{c'}^{(0)} \sum_{l=1}^{M^N} \sqrt{p_c(\mu_l)\, p_{c'}(\mu_l)}}. \qquad (27)$$


It should be clear that although we have designed this 
figure of merit to be as comprehensive as possible, it is 
by no means better than any of the other ones described 
in Refs. [T31131] . 
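As a minimal numerical sketch of the distinguishability, the following Python functions evaluate the Bhattacharyya coefficient, the Hellinger distance, and a prior-weighted distinguishability over a pool of outcome distributions. The weighting of each ordered pair by the product of its priors is our reading of the averaged coefficient and should be taken as an assumption; the function names are ours.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized distributions over the same leaves."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance, i.e., sqrt(1 - BC); clipped at zero against rounding errors."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya(p, q)))

def distinguishability(dists, priors):
    """Prior-weighted distinguishability of a pool of C distributions.

    Assumption: each ordered pair (c, c') with c != c' is weighted by the
    product of its prior probabilities before the overlaps are summed.
    """
    n_states = len(dists)
    averaged = 0.0
    for c in range(n_states):
        for cp in range(n_states):
            if c != cp:
                averaged += priors[c] * priors[cp] * bhattacharyya(dists[c], dists[cp])
    return math.sqrt(max(0.0, 1.0 - averaged))

# Two candidates that never share a leaf are perfectly distinguishable.
disjoint = distinguishability([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

With this weighting, two identical distributions with equal priors give $\sqrt{1/2}$ rather than zero, which is a consequence of the assumed normalization and not a statement about the original figures.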


2. Mean min-to-max ratio 


Another figure of merit which quantifies the overlap of two probability distributions is the mean ratio of the minimum-to-maximum probability distributions

$$\mathcal{R} = \sum_{l=1}^{M^N} \sum_{c=1}^{C} p_c(\mu_l)\, \frac{\min_{c'} \{p_{c'}(\mu_l)\}}{\max_{c''} \{p_{c''}(\mu_l)\}}. \qquad (28)$$

Unlike $\mathcal{D}$, which is to be maximized, we should aim to minimize $\mathcal{R}$. This definition of the mean ratio of minimum-to-maximum probabilities, though intuitive, cannot be easily applied to cases where $C > 3$. We shall therefore only apply it to pools of candidate states containing two elements.
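The mean min-to-max ratio can be illustrated with the short Python sketch below. The optional `weights` argument, which defaults to uniform weights $1/C$ so that the result lies in $[0, 1]$, is our own normalization choice and is not taken from the definition above; leaves that no candidate can reach are skipped.

```python
def mean_min_to_max(dists, weights=None):
    """Mean ratio of the minimum to the maximum probability over all leaves.

    dists[c][l] is the probability that candidate c ends up in leaf l.
    Assumption: each candidate is weighted by weights[c] (uniform by default).
    """
    n_states = len(dists)
    if weights is None:
        weights = [1.0 / n_states] * n_states
    total = 0.0
    for leaf in range(len(dists[0])):
        column = [d[leaf] for d in dists]
        top = max(column)
        if top == 0.0:
            continue  # unreachable leaf: contributes nothing
        ratio = min(column) / top
        total += ratio * sum(w * p for w, p in zip(weights, column))
    return total

# Fully overlapping distributions give 1, disjoint ones give 0.
overlapping = mean_min_to_max([[0.5, 0.5], [0.5, 0.5]])
separated = mean_min_to_max([[1.0, 0.0], [0.0, 1.0]])
```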


3. Error probability 

Both $\mathcal{D}$ and $\mathcal{R}$ stem from an algebraic, rather than an operational, rationale. A more operational figure of merit would be the discrimination error $\mathcal{E}$, i.e., the probability of mistaking one state $c$ for another $c' \neq c$ and vice versa. Just like $\mathcal{R}$, $\mathcal{E}$ should be minimized. It is given by

$$\mathcal{E} = p_c^{(0)} P_{c'}(c) + p_{c'}^{(0)} P_c(c'), \qquad (29)$$

where $P_c(c')$ is the probability that the generalized measurement identifies state $c'$ as $c$ [38]. Ideally, we should have $P_c(c) = P_{c'}(c') = 1$ and $P_c(c') = P_{c'}(c) = 0$ for $c \neq c'$. These identification probabilities are given by

$$P_c(c') = \sum_{l \in \mathcal{L}_c} p^{(N)}_{\mu_l | c'}, \qquad (30)$$

where $p^{(N)}_{\mu_l | c'}$ is expressed in Eq. (23). $\mathcal{L}_c$ indicates the set of leaves $l$ most likely to be attained by state $c$. In other words, if state $c$ is most likely to traverse branch $\mu_l$, then we assign leaf $l$ to $\mathcal{L}_c$. If we limit ourselves to two candidate states $\hat{\rho}_1$ and $\hat{\rho}_2$, we will have

$$l \in \mathcal{L}_1 \Leftrightarrow p_1(\mu_l) > p_2(\mu_l), \qquad (31)$$

$$l \in \mathcal{L}_2 \Leftrightarrow p_2(\mu_l) > p_1(\mu_l). \qquad (32)$$

If $p_1(\mu_l) = p_2(\mu_l)$, the sequence $\mu_l$ will correspond to an inconclusive outcome. Note that, for completeness, $\mathcal{L}_1$ and $\mathcal{L}_2$ are non-overlapping and their cardinalities add up to $M^N$, i.e., the total number of leaves. Furthermore, although we have only expressed $\mathcal{E}$ for the case of two candidate states, it can easily be generalized to an average of pairwise discrimination errors, in a similar fashion to what we did with the Bhattacharyya coefficient in Eq. (25).


4. Bell factor


Quantum measurements are not merely ends in themselves, but are often part of a broader process or computation. In fundamental physics, investigations of nonlocality are one such application. Although Bell tests are based on measurement, it is not the identification or the discrimination of quantum states that is their primary purpose. Rather, it is the Bell factor $\mathcal{B}$, a statistical discrepancy between the classical and quantum correlations of space-like events, which is the prime figure of merit. Though indirectly related, $\mathcal{B}$ and $\mathcal{E}$ are not necessarily optimized by the same configuration of measurement. The Bell factor we shall consider is that which can be produced by a tri-partite W state, which we shall express in Fock basis as

$$|W\rangle = \frac{1}{\sqrt{3}} \left( |0,0,1\rangle + |0,1,0\rangle + |1,0,0\rangle \right). \qquad (33)$$

The full details of the Bell setup and the exact expression for $\mathcal{B}$ are provided in Ref. jSS] so we shall not reproduce them here. It suffices to know that the Bell test rests on our ability to distinguish, at one or more modes of the W state, the two candidate states

$$\hat{\rho}_1 = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \quad \text{and} \quad \hat{\rho}_2 = \frac{|0\rangle - |1\rangle}{\sqrt{2}}, \qquad (34)$$

whose probabilities of incidence are equal. Ideally, this discrimination is done by the POVMs

$$\hat{\Theta}_1 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}, \quad \text{and} \quad \hat{\Theta}_2 = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}. \qquad (35)$$

In practice, however, these ideal projectors can at best be approximated with the superoperators we obtained from our generalized measurement. This is achieved by bundling all the sequences of superoperations that are most likely to be projected on by a given state such that

$$\hat{\Theta}_1 \approx \sum_{l \in \mathcal{L}_1} \mathcal{S}_{\mu_l}[\hat{\rho}_1], \quad \text{and} \quad \hat{\Theta}_2 \approx \sum_{l \in \mathcal{L}_2} \mathcal{S}_{\mu_l}[\hat{\rho}_2], \qquad (36)$$

where $\mathcal{L}_1$ and $\mathcal{L}_2$ were defined in Eqs. (31) and (32). $\mathcal{S}_{\mu_l}$ represents the $N$ nested superoperations along branch $\mu_l$:

$$\mathcal{S}_{\mu_l} = \mathcal{S}_{\mu^{(N)}} \circ \cdots \circ \mathcal{S}_{\mu^{(1)}}. \qquad (37)$$

Finally, let us recall that for the particular case of the W state, a positive $\mathcal{B}$ is indicative of a violation of Bell’s inequality and the larger its positive amplitude, the more decisive is the violation.


5. Orthogonality 

Let us conclude the presentation of figures of merit by introducing the orthogonality $\Omega$ of two states $\hat{\rho}_1$ and $\hat{\rho}_2$, which we define as the complement of fidelity

$$\Omega(\hat{\rho}_1, \hat{\rho}_2) = 1 - \mathrm{Tr}\{\hat{\rho}_1 \hat{\rho}_2\}. \qquad (38)$$

Some previous work, particularly by Takeoka et al., has presented orthogonality as a central criterion for the adaptive discrimination of quantum states [4nil43j. If we assume that all the candidate states in the pool are initially orthogonal, $\Omega(\hat{\rho}_i^{(0)}, \hat{\rho}_j^{(0)}) = 1$, $\forall i \neq j$, it indeed makes sense to ensure they remain orthogonal after $k$ partial measurements such that $\Omega(\hat{\rho}_i^{(k)}, \hat{\rho}_j^{(k)}) = 1$, $\forall k > 0$. Any non-orthogonality that is acquired during the gradual collapse of the states will represent a fundamental and irrecoverable loss of distinguishability. Though intuitive, the requirement of orthogonality does not have any operational motive in itself. In fact, it leads to singularities in the optimization of the unitary parameters $\tau$ if the candidate states are the Fock qubits of Eq. (34) [45]. The utility of orthogonality as a figure of merit shall therefore be de-emphasized in our simulations.
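Two of the figures of merit of this section lend themselves to a short numerical sketch: the discrimination error of Eq. (29) for two candidate states, and the orthogonality of Eq. (38). The Python functions below are illustrative only. They assume that leaves are assigned to the candidate with the strictly larger probability, that ties are inconclusive and contribute to neither error term, and that density matrices are given as nested lists; the function names are ours.

```python
def leaf_sets(p1, p2):
    """Assign each leaf to the candidate that is more likely to reach it."""
    set_1 = [l for l, (a, b) in enumerate(zip(p1, p2)) if a > b]
    set_2 = [l for l, (a, b) in enumerate(zip(p1, p2)) if b > a]
    return set_1, set_2  # leaves with a == b are inconclusive and left out

def discrimination_error(p1, p2, prior_1=0.5, prior_2=0.5):
    """Probability of mistaking state 1 for state 2 and vice versa."""
    set_1, set_2 = leaf_sets(p1, p2)
    one_taken_for_two = sum(p1[l] for l in set_2)  # state 1 lands on a leaf of state 2
    two_taken_for_one = sum(p2[l] for l in set_1)  # state 2 lands on a leaf of state 1
    return prior_1 * one_taken_for_two + prior_2 * two_taken_for_one

def orthogonality(rho_1, rho_2):
    """Complement of the overlap, 1 - Tr(rho_1 rho_2), for square matrices."""
    dim = len(rho_1)
    trace = sum(rho_1[i][j] * rho_2[j][i] for i in range(dim) for j in range(dim))
    return 1.0 - trace

plus = [[0.5, 0.5], [0.5, 0.5]]      # projector on (|0> + |1>)/sqrt(2)
minus = [[0.5, -0.5], [-0.5, 0.5]]   # projector on (|0> - |1>)/sqrt(2)
error = discrimination_error([0.9, 0.1], [0.2, 0.8])
```

For the two leaf distributions above, state 1 reaches the leaf assigned to state 2 with probability 0.1 and state 2 reaches the leaf assigned to state 1 with probability 0.2, so that equal priors give an error of 0.15.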


B. Numerical simulations 

In this section, we shall present some trends in the figures of merit $\mathcal{D}$, $\mathcal{B}$, $\mathcal{R}$ and $\mathcal{E}$ under various generalized measurement configurations. The pool of candidate states consists of $C$ qubits equally spread along the real-valued longitudinal cross-section of the Bloch sphere such that

$$\hat{\rho}_c = \begin{bmatrix} \cos^2(\theta/2) & \cos(\theta/2)\sin(\theta/2) \\ \sin(\theta/2)\cos(\theta/2) & \sin^2(\theta/2) \end{bmatrix}, \qquad (39)$$

where $\theta = $ and $c \in \{1, \cdots, C\}$. In the special case of $C = 2$, we are left with the two states of Eq. (34).

Two types of operations $\hat{A}_\tau$ shall be considered. The first is a Hadamard rotation, defined in Fock space as

$$\hat{A}_\tau^{(\mathrm{Had.})} = \qquad (40)$$

where $\tau \in \mathcal{T} = [-\pi, \pi]$. The second operation is a coherent displacement

$$\hat{A}_\tau^{(\mathrm{Dis.})} = \exp\left(\tau \hat{a}^\dagger - \tau^* \hat{a}\right), \qquad (41)$$

where $\hat{a}$ and $\hat{a}^\dagger$ are the annihilation and creation operators, respectively. (An explicit expression for $\hat{A}_\tau^{(\mathrm{Dis.})}$ acting on Fock states is given in Refs. [HI SS].) The parameter range for the displacements shall be confined to the real segment $\mathcal{T} = [-1, 1]$.

Because the simulations are numerical, we only probe the parameter ranges $\mathcal{T}$ at a finite number $S_\tau \in \mathbb{N}$ of sample points. We chose $S_\tau = 40$ and $S_\tau = 10$ equidistant points for the Hadamard angle and coherent displacement, respectively. Furthermore, in the search for the optimal parameter $\tau_o$ at any given mode, the density of the initial mesh of parameters is doubled (i.e., $S_\tau \to 2 \times S_\tau$) so long as the figure of merit does not converge at a certain satisfactory rate.

Three types of von Neumann projections $\hat{\Pi}_\mu$ shall be simulated. The first is an imperfect on-off click detector such as an avalanche photodiode (APD). The two possible outcomes ($M = 2$), no-click and click, are given in Fock space by

$$\hat{\Pi}_1^{(\mathrm{APD})} = \sum_{n=0}^{\infty} (1 - \eta)^n\, |n\rangle\langle n|, \quad \text{and} \qquad (42)$$

$$\hat{\Pi}_2^{(\mathrm{APD})} = \hat{1} - \hat{\Pi}_1^{(\mathrm{APD})}, \qquad (43)$$

respectively, where $\eta \in [0, 1]$ is the quantum efficiency [38]. A fine-grained generalization of the APD is the photon number resolving detector (PNRD),

$$\hat{\Pi}_{\mu}^{(\mathrm{PNRD})} = \sum_{n=\mu-1}^{\infty} \binom{n}{\mu - 1}\, \eta^{\mu - 1} (1 - \eta)^{n - \mu + 1}\, |n\rangle\langle n|, \qquad (44)$$

$$\hat{\Pi}_{M}^{(\mathrm{PNRD})} = \hat{1} - \sum_{\mu'=1}^{n_s} \hat{\Pi}_{\mu'}^{(\mathrm{PNRD})}, \qquad (45)$$

where $n_s$ is the photon count at which the PNRD saturates such that $M = n_s + 1$. (An APD is a PNRD which already saturates at $n_s = 1$ in that it cannot differentiate one photon from more than one.) The last von Neumann projector we shall use is a homodyne detector (HD) which bins the real quadratures $x$ into negative and positive values

$$\hat{\Pi}_1^{(\mathrm{HD})} = \int_{-\infty}^{0} |x\rangle\langle x|\, dx, \qquad (46)$$

$$\hat{\Pi}_2^{(\mathrm{HD})} = \int_{0}^{\infty} |x\rangle\langle x|\, dx. \qquad (47)$$

Finally, we shall set the multi-modal beam splitter $\hat{B}$ to de-localize the candidate states equally into $N$ modes such that the transmission of the zeroth mode at the $k$th splitting is given by

$$t^{(k)} = \frac{N - k}{N - k + 1}. \qquad (48)$$

For simplicity, all ancillary modes shall remain empty by setting them to the vacuum $|0\rangle\langle 0|$, $\forall k$.
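To make the detector models and the equal splitting more concrete, the Python sketch below builds the diagonal Fock-space entries of a lossy PNRD (of which the APD is the special case saturating at one photon) and checks that the transmission $(N-k)/(N-k+1)$ sends a fraction $1/N$ of the input to each mode. Several choices here are assumptions of ours: the Fock space is truncated at `n_max` photons, elements are indexed by the number of detected photons rather than by the outcome label, and the transmission is treated as an intensity ratio.

```python
import math

def pnrd_povm(eta, n_s, n_max):
    """Diagonal entries of the n_s + 1 PNRD elements for photon numbers 0..n_max.

    Element m < n_s corresponds to exactly m detected photons with efficiency eta;
    the last element collects everything at or above saturation.
    """
    elements = []
    for m in range(n_s):
        elements.append([
            math.comb(n, m) * eta ** m * (1.0 - eta) ** (n - m) if n >= m else 0.0
            for n in range(n_max + 1)
        ])
    saturated = [1.0 - sum(e[n] for e in elements) for n in range(n_max + 1)]
    elements.append(saturated)
    return elements

def apd_povm(eta, n_max):
    """On-off detector: a PNRD that already saturates at one photon."""
    return pnrd_povm(eta, 1, n_max)

def transmissions(n_modes):
    """Transmission of the zeroth mode at the k-th splitting, k = 1..N."""
    return [(n_modes - k) / (n_modes - k + 1) for k in range(1, n_modes + 1)]

def mode_fractions(n_modes):
    """Fraction of the input diverted to each mode (assumes intensity transmissions)."""
    remaining, fractions = 1.0, []
    for t in transmissions(n_modes):
        fractions.append(remaining * (1.0 - t))
        remaining *= t
    return fractions

no_click, click = apd_povm(eta=0.9, n_max=3)
fractions = mode_fractions(4)
```

For instance, `no_click[1]` is the probability $1 - \eta$ that a single photon goes undetected, and `fractions` contains four equal shares of one quarter each.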

Note that unless stated otherwise, all the results presented below were obtained using greedy algorithms whereby the distinguishability $\mathcal{D}$ was the figure of merit optimized at each node. (Other figures of merit could of course have been chosen to drive the optimization, depending on the application at hand.) This arbitrary choice, in addition to the finite sampling of the parameter ranges $\mathcal{T}$, means that the figures of merit presented here are underestimated and that a global optimization algorithm with a denser parameter sampling can be expected to perform better.


1. Hadamard rotation and APD 

Let us for now set $C = 2$ so that the candidate states are those of Eq. (34). The first combination we shall simulate comprises $\hat{A}_\tau^{(\mathrm{Had.})}$ and $\hat{\Pi}_\mu^{(\mathrm{APD})}$ at each partial measurement (Fig. 5). Under a direct measurement setting $N = 1$, the particularity of this combination is that it satisfies exactly the similarity transformation of Sec. II B, provided we set $\tau_o = \pm 1$ and $\eta = 100\%$:


^^^Had.)^^(APD) ^(_Had.) ^ 

Such a combined effect of $\hat{A}_\tau^{(\mathrm{Had.})}$ and $\hat{\Pi}_\mu^{(\mathrm{APD})}$ is thus equivalent to that of an ideal POVM and there would therefore be no need to resort to a generalized measurement with $N > 2$. This can be seen in Fig. 5 where all the figures of merit with $\eta = 100\%$ are readily saturated at their global optima and do not exhibit any $N$-dependence. In practice, however, not only does there not exist any trivial laboratory setup which implements $\hat{A}_\tau^{(\mathrm{Had.})}$, but even if there were, the quantum efficiency of the APD will likely be much less than unity. This non-unit quantum efficiency causes a degradation in all figures of merit. For example, simply going from $\eta = 100\%$ to $\eta = 90\%$ decreases the Bell factor from its maximum at $\mathcal{B} = 0.25$ to an inconclusive $\mathcal{B} = 0$ while the discrimination error $\mathcal{E}$ jumps from zero to 5%. With this in mind, it is interesting to see whether a generalized measurement can compensate for the lower quantum efficiency. This turns out to indeed be the case: the overall trends of all figures of merit improve with an increased $N$. For instance, the discrimination error of a generalized measurement with $N = 5$ and $\eta = 70\%$ is as good as that of a direct measurement $N = 1$ with a higher $\eta = 80\%$.


2. Coherent displacement and APD 

If, instead of a Hadamard rotation, we use a coherent displacer $\hat{A}_\tau^{(\mathrm{Dis.})}$, we obtain the trends in Fig. 6. Being a realistic device, the coherent displacer, unlike $\hat{A}_\tau^{(\mathrm{Had.})}$, cannot achieve a perfect similarity transformation to the ideal POVM. Indeed, none of the figures of merit are ideal for the direct measurement $N = 1$ even if we do see a violation of Bell’s inequality for $\eta = 100\%$. (Bell tests with a combination of $\hat{A}_\tau^{(\mathrm{Dis.})}$ and $\hat{\Pi}_\mu^{(\mathrm{APD})}$ were discussed at length in Refs. [321 ESI EZ].) By implementing generalized measurements, all figures of merit improve. This is particularly apparent for the discrimination error, for which a generalized measurement with $N = 4$ and $\eta = 80\%$ is even slightly better than a direct measurement with ideal quantum efficiency. To rephrase this surprising result with a classical metaphor, it is as if four myopic eyes could discern different objects as well as, if not better than, a single eye of perfect vision.


3. Homodyne detection 


Our simulations with a homodyne detector $\hat{\Pi}_\mu^{(\mathrm{HD})}$ clearly show that there is nothing to be gained by generalized measurements and that, on the contrary, all figures of merit suffer a degradation with higher $N$. These trends are shown in Figs. 7 and 8. This, however, may simply mean that the quadrature binning of Eqs. (46) and (47) is not adapted to the problem or that conjugate quadratures also need to be probed.


4. Distinguishability as a function of M and C


Figures 9 and 10 make use of $\hat{A}_\tau^{(\mathrm{Had.})}$ and $\hat{A}_\tau^{(\mathrm{Dis.})}$, respectively, except that instead of an APD, we employ a photon number resolving detector which saturates at $n_s = 2$. Although the overall trend is the same as that of Figs. 5 and 6, the figures of merit do not perform as well, except for the distinguishability (Fig. 11). This might be due to the fact that for only two candidate states ($C = 2$), the decision tree, now made up of three-pronged nodes ($M = 3$), scatters into too many leaves with too little statistical value. This problem is referred to as fragmentation and we shall come back to it in Sec. IV C. The response of the figures of merit to an increased output cardinality $M$ is an interesting topic in its own right but we shall not expand on it here. The same goes for the behavior of the figures of merit with respect to a larger pool of candidate states. Figure 12 displays the distinguishability for different values of $C$ as a function of $N$ using $\hat{A}_\tau^{(\mathrm{Dis.})}$ and $\hat{\Pi}_\mu^{(\mathrm{APD})}$. (Recall how the candidate states are sampled from the Bloch sphere in Eq. (39).) An improvement with $N$ is mostly witnessed for $C = 2$, whereas larger pools $C > 4$ do not exhibit any appreciable improvement with increased $N$. Here again, we should leave open the possibility that a different set of operations could improve the results even for larger $C$. (The stagnation witnessed for $C > 4$ cannot be solely due to the non-orthogonality of the candidate states since the plot with $C = 3$, which also features non-orthogonal states, does improve with $N$.)


C. Computational considerations


We already mentioned the heavy developmental cost that is incurred by the exponential growth of the decision trees. Even with parallelized simulations, most of our calculations were too demanding to implement beyond $N \approx 10$. The problem is compounded if one includes von Neumann projectors with more than two possible outcomes. For example, our runs with $\hat{A}_\tau^{(\mathrm{Dis.})}$ and $\hat{\Pi}_\mu^{(\mathrm{PNRD})}$ had to be aborted already at $N = 8$ (cf. Fig. 10). It is therefore worth mentioning a few considerations that cut down or at least help us evaluate the computational costs.

Consider for example dynamic programming, which is one of the methods that could achieve a global optimization of the Bayesian network. In this case, all possible combinations of parameters $\tau$ in the range $\mathcal{T}$ need to be tested for all internal nodes. The run time complexity is then given by

$$T(N) \in O\left( S_\tau^{\frac{M^N - 1}{M - 1}} \right), \qquad (50)$$

where $S_\tau$ is the number of parameter points sampled from the parameter range. If we are to opt for greedy algorithms instead, the run time complexity is significantly reduced to

$$T(N) \propto \frac{M^N - 1}{M - 1} \in O\left(M^N\right), \qquad (51)$$

where we assume that the algorithm operates as a single-pass traversal of the tree. One way to get rid of the exponential dependence on $N$ is to build a dictionary of decisions which can be recycled at each node. This dictionary consists of a $C$-dimensional table of all the possible
FIG. 5: The figures of merit as a function of $N$ using a Hadamard rotator $\hat{A}_\tau^{(\mathrm{Had.})}$ and an APD $\hat{\Pi}_\mu^{(\mathrm{APD})}$ with different quantum efficiencies $\eta$. The pool of candidate states consists of the two qubits of Eq. (34).

Panels: Distinguishability $\mathcal{D}$ [%], Bell factor $\mathcal{B}$, Mean min-to-max ratio $\mathcal{R}$ [%], Discrimination error $\mathcal{E}$ [%].

FIG. 6: The figures of merit as a function of $N$ using a coherent displacer $\hat{A}_\tau^{(\mathrm{Dis.})}$ and an APD $\hat{\Pi}_\mu^{(\mathrm{APD})}$ with different quantum efficiencies $\eta$. The pool of candidate states consists of the two qubits of Eq. (34).


states on the entire Bloch ball that the qubits can decohere into. For any such combination of possible states, the entries of the table will store the parameter $\tau_o$ which optimizes the figure of merit. This technique is of course demanding in its preparation stage as the entire Bloch ball will have to be discretized into sufficiently many $C$-tuples of sample points. However, it represents the best way to scale the problem without the exponential cost in $N$.
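The orders of magnitude at stake can be made explicit with a few lines of Python. The sketch counts the internal nodes of a full $M$-ary tree of depth $N$ and the number of figure-of-merit evaluations under two assumptions of ours: a greedy pass evaluates each of the $S_\tau$ sample points once per internal node, and an exhaustive global sweep tests every combination of sample points over all internal nodes. The function names are ours.

```python
def internal_nodes(n_outcomes, depth):
    """Number of internal nodes of a full M-ary tree: (M^N - 1) / (M - 1)."""
    return sum(n_outcomes ** k for k in range(depth))

def greedy_evaluations(n_outcomes, depth, n_samples):
    """One parameter sweep per internal node (single-pass traversal)."""
    return n_samples * internal_nodes(n_outcomes, depth)

def global_evaluations(n_outcomes, depth, n_samples):
    """Every combination of sample points over all internal nodes (assumed exhaustive)."""
    return n_samples ** internal_nodes(n_outcomes, depth)

# Binary outcomes, three partial measurements, 40 sample points per node.
greedy_cost = greedy_evaluations(2, 3, 40)
global_cost = global_evaluations(2, 3, 40)
```

Even for this small tree of seven internal nodes, the greedy pass needs 280 evaluations whereas the exhaustive sweep would need $40^7$, which illustrates why a reusable dictionary of decisions is attractive.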


Another computational overhead is due to the frag¬ 
mentation of the Bayesian network [25] . This is the pro¬ 
cess whereby vast swaths of the decision tree yield little 
statistical significance and yet take up as much resources 
to compute as the more relevant branches. This is illus¬ 
trated in Fig. In the case of the top-down greedy 
algorithm we have implemented, this could have been averted by aborting the recursion as soon as the statistical significance of a given node falls below a certain threshold or, alternatively, if the figure of merit fails to converge at a satisfactory rate from one level to the next. Such a resource management technique would result in unbalanced trees, whose leaves may not all lie at the deepest level $k = N$.


V. DISCUSSION AND OUTLOOK 

By the present work, we have tried to bring some clarity and structure to the design and optimization of generalized measurement. We have stated the problem as follows: given a selection of variable unitary operations and von Neumann projectors, how can we assemble them so as to optimize certain figures of merit resulting from the measurement? We have built up an algebraic answer to this question in three stages. The first recognizes that the variable unitary operations can be used to emulate, albeit approximately, a similarity transformation between the von Neumann projector and the ideal POVM. That is, we have reduced the notion of measurement to a diagonalization problem. The second stage extends this idea to higher dimensions as per Neumark’s theorem, thereby providing better control over the interaction between the measured states and the measuring operator. This is what we referred to as adaptation in Hilbert space. The sequential use of weak measurements came into effect in the third and final stage where we dealt with the time-dependence of adaptation. This aspect has been extensively treated under various formulations such as stochastic Schrödinger equations [HI |48j, quantum filtering equations [5], or Markov filtering equations [7]. We instead opt for a probabilistic graphical model, i.e., a
Panels: Distinguishability $\mathcal{D}$ [%], Bell factor $\mathcal{B}$, Mean min-to-max ratio $\mathcal{R}$ [%], Discrimination error $\mathcal{E}$ [%].

FIG. 7: The figures of merit as a function of $N$ using a Hadamard rotator $\hat{A}_\tau^{(\mathrm{Had.})}$ and a homodyne detector $\hat{\Pi}_\mu^{(\mathrm{HD})}$ of unit quantum efficiency. The pool of candidate states consists of the two qubits of Eq. (34).

FIG. 8: The figures of merit as a function of $N$ using a coherent displacer $\hat{A}_\tau^{(\mathrm{Dis.})}$ and a homodyne detector $\hat{\Pi}_\mu^{(\mathrm{HD})}$ of unit quantum efficiency. The pool of candidate states consists of the two qubits of Eq. (34).


Bayesian network, to represent the optimization of measurement under a gradual collapse scenario. More specifically, we use a class probability tree whose leaves represent measurement outcomes and are weighted by probability distributions. The various branches $\mu$ through which the candidate states “trickle down” from the root to the leaves can then be used to retrodict the state identities $c$. Furthermore, the parameter settings stored in every node of the tree can be looked up by the experimentalist to determine the optimal measurement settings $\tau_o$ as the data acquisition unfolds in real time.

Quantum information holds many promises for the future of computation. At the present time, however, it may rather be classical computer science, and specifically machine learning, which is more likely to advance quantum measurement protocols. This is what transpires from the second part of this article where we have touched on the various algorithms with which the class probability trees are built. For simplicity, we have used the straightforward greedy approach. It is however clear that the most general measurements will require an algorithm design in their own right. Indeed, we have simplified the problem by choosing an equal beam splitting and we left the ancillary modes empty. In addition, the same pairs of unitary operators and von Neumann projectors were recycled in all $N$ stages. Such assumptions for the sake of simplicity need not hold in the general case as we could conceive of a much more elaborate multiplexing of different combinations of unitaries and projectors, as well as asymmetric couplings with complex ancillary states (e.g., squeezed light [H]). Even the bundling of the generalized outcomes as in Eqs. (31) and (32), or that of the direct outcomes as in Eqs. (46) and (47), can be modified to better serve the figure of merit. In brief, each of these additional degrees of freedom, while leveraging more control over the measurement, introduces an extra layer of complexity in the optimization algorithms. As for the very construction of the Bayesian network, we have followed a top-down flow of the tree structure which replicates the chronological order of quantum collapses. It remains to be investigated whether different graph configurations with, say, a cyclic layout, would present any benefits.

Finally, we have plotted how the figures of merit respond under different configurations and came to the conclusion that, for most of the setups we have tried, generalized measurements offer a distinct advantage over direct measurements. (Pending further investigation, the case where they did not, i.e., homodyning, may simply be due to a poor choice of the quadrature binning or of parameter range, rather than to any shortcoming of the adaptation per se.) Overall, these results are particularly promising
FIG. 9: The figures of merit as a function of $N$ using a Hadamard rotator $\hat{A}_\tau^{(\mathrm{Had.})}$ and a PNRD $\hat{\Pi}_\mu^{(\mathrm{PNRD})}$ with different quantum efficiencies $\eta$. The PNRD has three possible outcomes $M = 3$ and saturates at $n_s = 2$ photon counts. The pool of candidate states consists of the two qubits of Eq. (34).

FIG. 10: The figures of merit as a function of $N$ using a coherent displacer $\hat{A}_\tau^{(\mathrm{Dis.})}$ and a PNRD $\hat{\Pi}_\mu^{(\mathrm{PNRD})}$ with different quantum efficiencies $\eta$. The PNRD has three possible outcomes $M = 3$ and saturates at $n_s = 2$ photon counts. The pool of candidate states consists of the two qubits of Eq. (34). The simulations took an inordinate amount of time beyond $N = 7$ and had to be aborted. This is due to the combined demand in memory resources from the coherent displacer, which occupies larger matrices in Fock space than the qubit rotator, and the higher exponential growth of the decision tree with $M = 3$ instead of $M = 2$.


in light of the fact that a scaling to larger de-localizations $N$ compensates for lower quantum efficiencies. This is a crucial advantage over direct measurements where quantum efficiency is an irremediable hindrance. It remains of course to further analyze the asymptotic behavior with $N$ in order to see if the figures of merit saturate before reaching their theoretical optima.
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Appendix A: Superoperators 

The coupling of the zeroth mode with the fcth mode, 
followed by the transformation and the collapse of that 
kth mode, transforms the incoming state into pi^'^ . 

Instead of having to carry around partial traces, we can 


represent the whole transformation of the zeroth mode 
as a superoperator acting on (Fig. |^. This trans¬ 

formation is 

= Tr,„e{AT 

Jai) 

If we write 

k = (g) ^^ G) T('‘)) A('=) 

= (g Bpk), (A2) 
FIG. 11: Dependence of the distinguishability $\mathcal{D}$ on the number $M$ of measurement outcomes at each node. Each measurement consists of a coherent displacer $\hat{A}_\tau^{(\mathrm{Dis.})}$ and a PNRD $\hat{\Pi}_\mu^{(\mathrm{PNRD})}$ which saturates at 1, 2, and 3 photons, respectively. The pool of candidate states consisted of the two qubits of Eq. (34). In order to bring out any differences which could be revealed by a higher photon count, we have extended the parameter range for the displacement to $\mathcal{T} = [-2, 2]$ instead of the shorter segment $[-1, 1]$ used so far. We can clearly see that the distinguishability improves with $M$ although the same cannot be said of the other figures of merit in this particular example (cf. Fig. 10).

FIG. 12: Dependence of the distinguishability $\mathcal{D}$ on the number $C$ of candidate qubits in the pool defined in Eq. (39). Each measurement consists of a coherent displacer $\hat{A}_\tau^{(\mathrm{Dis.})}$ and an APD $\hat{\Pi}_\mu^{(\mathrm{APD})}$ of quantum efficiency. For this particular combination, the generalization of the measurement to $N > 2$ mostly benefits pools made up of two or three qubits.


where the Dirac bras and kets only apply to the ancillary 
mode. 
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